Skip to content

geotiff: hotfix require_transform_for_georeferenced for int-coord arrays#1969

Merged
brendancol merged 1 commit into
mainfrom
hotfix-int-coord-require-transform
May 15, 2026
Merged

geotiff: hotfix require_transform_for_georeferenced for int-coord arrays#1969
brendancol merged 1 commit into
mainfrom
hotfix-int-coord-require-transform

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

PR #1954 made coords_to_transform return None for integer x/y coords (no-georef sentinel for #1949 round-trips). PR #1953 added require_transform_for_georeferenced which raises when both spatial dims are in da.coords and the resolved transform is None. Together they break every writer call that uses int coords, including the to_geotiff and VRT paths exercised by test_vrt_int_nodata_1564.py, test_metadata_round_trip_1484.py, test_georef_edges.py, and test_features.py (11 failures across the matrix).

Fix

Mirror the int/uint exemption from coords_to_transform inside require_transform_for_georeferenced. Int coord dtypes are the read-side no-georef signal; the writer should accept them silently instead of fail-closing.

Also tightens the error wording, which previously said "both axes are degenerate (1x1)" even when the array was non-degenerate.

Test plan

  • New regression test in test_int_coords_round_trip_hotfix_1962.py covers 2D and 3D int-coord round-trips.
  • Existing failing tests pass again:
    • test_features.py::TestExtraTags::test_extra_tags_round_trip
    • test_georef_edges.py::TestNonGeoreferencedRead::test_round_trip_preserves_pixels
    • test_metadata_round_trip_1484.py::TestTagPassThrough (3 tests)
    • test_vrt_int_nodata_1564.py (6 tests)
  • Broader -k "georef or coord or transform" sweep stays green.

…el parity with coords_to_transform)

PR #1953 added require_transform_for_georeferenced which raises when
both spatial dims are in da.coords and the resolved transform is None.
PR #1954 made coords_to_transform return None for integer x/y coords
as the no-georef sentinel (#1949). Their interaction broke writer
calls against int-coord arrays. Mirror the int/uint exemption from
coords_to_transform inside the guard so the writer accepts int coords
silently. Also fixes the error wording, which previously claimed
"both axes are degenerate (1x1)" even when the array was not 1x1.

Adds a regression test covering 2D and 3D int-coord round-trips.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 15, 2026
@brendancol brendancol merged commit 94c62fb into main May 15, 2026
5 of 6 checks passed
brendancol added a commit that referenced this pull request May 18, 2026
… (#2030)

* geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)

Bundles four of the five remaining #1987 slices (PRs 2, 3, 4, 7 in the
issue's numbering). PR 6 (``ConflictingCRSError``) landed earlier as
this branch's parent. PR 5 (``MixedBandMetadataError``) is intentionally
deferred to a follow-up that also migrates ~35 existing VRT test
fixtures via the new ``band_nodata='first'`` opt-out kwarg.

Active checks added in this PR
------------------------------

* ``UnparseableCRSError`` (#1987 PR 2):
  - Write: typed the existing ``_validate_crs_fallback`` raise from
    plain ``ValueError`` to ``UnparseableCRSError``. No behaviour
    change (subclass relationship).
  - Read: new ``_check_read_unparseable_crs`` that runs ``pyproj.CRS.
    from_user_input`` on ``crs_wkt`` and raises if pyproj cannot parse.
    Tolerates pyproj-parseable placeholders like ``"EPSG:4326"`` that
    the GDAL VRT ``<SRS>`` convention stashes into ``crs_wkt``.
  - Opt-out: ``allow_unparseable_crs=True`` kwarg, threaded through
    ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
    ``read_vrt``.

* ``RotatedTransformError`` (#1987 PR 3):
  - Read: new ``_check_read_rotated_transform`` that rejects affine
    transforms with non-zero b / d terms (rasterio Affine ordering).
    Downstream xrspatial ops (slope, aspect, hillshade, proximity,
    zonal) assume axis-aligned grids; a rotated transform silently
    produced wrong results.
  - Opt-out: ``allow_rotated=True`` kwarg, same threading as
    ``allow_unparseable_crs``.

* ``NonUniformCoordsError`` (#1987 PR 4):
  - Write: new ``_check_write_non_uniform_coords`` that diffs the
    ``y`` and ``x`` coord arrays against the first step and rejects
    when relative drift exceeds 1e-5 (mirrors the existing #1720
    coord-regularity tolerance). The int-dtype sentinel from #1969 is
    exempted (the no-georef fallback uses 0..N-1 ints which the writer
    treats specially).
  - No new kwarg: the fix is to resample, not to opt out.

* ``ConflictingNodataError`` (#1987 PR 7):
  - Write: new ``_check_write_conflicting_nodata`` that refuses when
    ``attrs['nodata']`` disagrees with every concrete entry in
    ``attrs['nodatavals']``. ``None`` and NaN entries in the rioxarray
    tuple are skipped (same convention as ``_resolve_nodata_attr``).
    NaN as the canonical value paired with a concrete numeric in the
    tuple also raises -- "NaN is the sentinel" and "X is the sentinel"
    contradict.
  - Opt-out: explicit ``nodata=`` writer kwarg overrides both attrs
    and bypasses the check.

Shared infrastructure
---------------------

* ``_attrs.py`` gains ``_validate_read_geo_info(geo_info, *, window,
  allow_rotated, allow_unparseable_crs)`` -- one helper called from
  the four read backends so the check site is uniform.
* Each writer entry point (``to_geotiff``, ``_write_vrt_tiled``,
  ``write_geotiff_gpu``) now builds a context dict carrying
  ``crs_kwarg`` / ``attrs_crs`` / ``attrs_crs_wkt`` /
  ``nodata_kwarg`` / ``attrs_nodata`` / ``attrs_nodatavals`` /
  ``coord_y`` / ``coord_x`` and feeds it to ``validate_write_metadata``.

Deferred: MixedBandMetadataError (#1987 PR 5)
---------------------------------------------

The mixed-band check function and its constants are defined in
``_validation.py`` but the registration call is commented out. The
``band_nodata=`` kwarg is threaded through both ``read_vrt`` and
``_read_vrt_chunked`` so the follow-up PR is a one-line registration
plus the test-fixture migration. About 35 existing VRT tests would
need to opt in to ``band_nodata='first'`` to keep their legacy
assertions, which is its own commit.

Test updates
------------

* ``test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases``
  split into a resolver-layer test (still asserts canonical wins) and a
  write-layer test (now asserts ``ConflictingNodataError``).
* ``test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases``
  updated to expect ``ConflictingNodataError`` on the disagreement and to
  show the ``nodata=`` kwarg opt-out.
* ``test_reader_kwarg_order_1935.py`` updated: the new
  ``allow_rotated`` / ``allow_unparseable_crs`` kwargs join the
  canonical order; ``band_nodata`` goes in ``read_vrt``'s
  ``allowed_tail`` since it is VRT-specific; ``read_geotiff_gpu``'s
  deprecated ``gpu`` alias moved back to the tail.
* ``test_ambiguous_metadata_hooks_1987.py`` framework tests now use an
  opaque ``_dispatch_probe`` payload key instead of ``"crs_wkt":
  "EPSG:4326"`` / ``"MALFORMED"`` placeholders that the newly registered
  ``_check_read_unparseable_crs`` would refuse.
* ``test_remaining_fail_closed_1987.py`` -- 19 new tests covering each
  active check plus the round-trip read-write contract.

Verification
------------

- ``pytest xrspatial/geotiff/tests/ -k 'not gpu and not cuda'`` --
  3013 passed (was 2994 pre-PR, +19 new).
- ``pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py
  -v`` -- 19 passed.

Refs #1987.

* geotiff: address review suggestions and nits on #1987 fail-closed bundle

- vrt.py: eager read_vrt now parses the VRT XML and runs the #1987
  read-side validators *before* _read_vrt_internal materialises the
  mosaic, so a rejected file fails fast instead of loading the full
  array into host memory first. The parsed VRTDataset is threaded
  through via the existing `parsed=` kwarg so the XML is parsed only
  once.
- _validation.py: extracted `_gdal_geotransform_to_affine_tuple` so
  the eager and chunked VRT call sites share one GDAL->rasterio
  reorder helper instead of duplicating the index shuffle.
- _attrs.py: documented in `_validate_read_geo_info` that the
  built transform is always axis-aligned (rotated TIFFs raise
  NotImplementedError upstream in `_geotags`), so the rotated check
  fires only on the VRT path.
- tests: added direct coverage for the row-axis rotation branch
  (GDAL GT[4] non-zero) and for the zero-step / constant-float-coord
  branch in `_check_write_non_uniform_coords`. Refactored the
  rotated-VRT fixture to take a `geo_transform` kwarg so both
  rotation axes share the same builder.
brendancol added a commit that referenced this pull request May 18, 2026
… (#2030) (#2031)

* geotiff: bundle remaining ambiguous-metadata fail-closed checks (#1987)

Bundles four of the five remaining #1987 slices (PRs 2, 3, 4, 7 in the
issue's numbering). PR 6 (``ConflictingCRSError``) landed earlier as
this branch's parent. PR 5 (``MixedBandMetadataError``) is intentionally
deferred to a follow-up that also migrates ~35 existing VRT test
fixtures via the new ``band_nodata='first'`` opt-out kwarg.

Active checks added in this PR
------------------------------

* ``UnparseableCRSError`` (#1987 PR 2):
  - Write: typed the existing ``_validate_crs_fallback`` raise from
    plain ``ValueError`` to ``UnparseableCRSError``. No behaviour
    change (subclass relationship).
  - Read: new ``_check_read_unparseable_crs`` that runs ``pyproj.CRS.
    from_user_input`` on ``crs_wkt`` and raises if pyproj cannot parse.
    Tolerates pyproj-parseable placeholders like ``"EPSG:4326"`` that
    the GDAL VRT ``<SRS>`` convention stashes into ``crs_wkt``.
  - Opt-out: ``allow_unparseable_crs=True`` kwarg, threaded through
    ``open_geotiff`` / ``read_geotiff_dask`` / ``read_geotiff_gpu`` /
    ``read_vrt``.

* ``RotatedTransformError`` (#1987 PR 3):
  - Read: new ``_check_read_rotated_transform`` that rejects affine
    transforms with non-zero b / d terms (rasterio Affine ordering).
    Downstream xrspatial ops (slope, aspect, hillshade, proximity,
    zonal) assume axis-aligned grids; a rotated transform silently
    produced wrong results.
  - Opt-out: ``allow_rotated=True`` kwarg, same threading as
    ``allow_unparseable_crs``.

* ``NonUniformCoordsError`` (#1987 PR 4):
  - Write: new ``_check_write_non_uniform_coords`` that diffs the
    ``y`` and ``x`` coord arrays against the first step and rejects
    when relative drift exceeds 1e-5 (mirrors the existing #1720
    coord-regularity tolerance). The int-dtype sentinel from #1969 is
    exempted (the no-georef fallback uses 0..N-1 ints which the writer
    treats specially).
  - No new kwarg: the fix is to resample, not to opt out.

* ``ConflictingNodataError`` (#1987 PR 7):
  - Write: new ``_check_write_conflicting_nodata`` that refuses when
    ``attrs['nodata']`` disagrees with every concrete entry in
    ``attrs['nodatavals']``. ``None`` and NaN entries in the rioxarray
    tuple are skipped (same convention as ``_resolve_nodata_attr``).
    NaN as the canonical value paired with a concrete numeric in the
    tuple also raises -- "NaN is the sentinel" and "X is the sentinel"
    contradict.
  - Opt-out: explicit ``nodata=`` writer kwarg overrides both attrs
    and bypasses the check.

Shared infrastructure
---------------------

* ``_attrs.py`` gains ``_validate_read_geo_info(geo_info, *, window,
  allow_rotated, allow_unparseable_crs)`` -- one helper called from
  the four read backends so the check site is uniform.
* Each writer entry point (``to_geotiff``, ``_write_vrt_tiled``,
  ``write_geotiff_gpu``) now builds a context dict carrying
  ``crs_kwarg`` / ``attrs_crs`` / ``attrs_crs_wkt`` /
  ``nodata_kwarg`` / ``attrs_nodata`` / ``attrs_nodatavals`` /
  ``coord_y`` / ``coord_x`` and feeds it to ``validate_write_metadata``.

Deferred: MixedBandMetadataError (#1987 PR 5)
---------------------------------------------

The mixed-band check function and its constants are defined in
``_validation.py`` but the registration call is commented out. The
``band_nodata=`` kwarg is threaded through both ``read_vrt`` and
``_read_vrt_chunked`` so the follow-up PR is a one-line registration
plus the test-fixture migration. About 35 existing VRT tests would
need to opt in to ``band_nodata='first'`` to keep their legacy
assertions, which is its own commit.

Test updates
------------

* ``test_attrs_contract_aliases_1984.py::test_canonical_nodata_wins_over_aliases``
  split into a resolver-layer test (still asserts canonical wins) and a
  write-layer test (now asserts ``ConflictingNodataError``).
* ``test_nodata_attr_aliases_1582.py::test_explicit_nodata_attr_wins_over_aliases``
  updated to expect ``ConflictingNodataError`` on the disagreement and to
  show the ``nodata=`` kwarg opt-out.
* ``test_reader_kwarg_order_1935.py`` updated: the new
  ``allow_rotated`` / ``allow_unparseable_crs`` kwargs join the
  canonical order; ``band_nodata`` goes in ``read_vrt``'s
  ``allowed_tail`` since it is VRT-specific; ``read_geotiff_gpu``'s
  deprecated ``gpu`` alias moved back to the tail.
* ``test_ambiguous_metadata_hooks_1987.py`` framework tests now use an
  opaque ``_dispatch_probe`` payload key instead of ``"crs_wkt":
  "EPSG:4326"`` / ``"MALFORMED"`` placeholders that the newly registered
  ``_check_read_unparseable_crs`` would refuse.
* ``test_remaining_fail_closed_1987.py`` -- 19 new tests covering each
  active check plus the round-trip read-write contract.

Verification
------------

- ``pytest xrspatial/geotiff/tests/ -k 'not gpu and not cuda'`` --
  3013 passed (was 2994 pre-PR, +19 new).
- ``pytest xrspatial/geotiff/tests/test_remaining_fail_closed_1987.py
  -v`` -- 19 passed.

Refs #1987.

* geotiff: address review suggestions and nits on #1987 fail-closed bundle

- vrt.py: eager read_vrt now parses the VRT XML and runs the #1987
  read-side validators *before* _read_vrt_internal materialises the
  mosaic, so a rejected file fails fast instead of loading the full
  array into host memory first. The parsed VRTDataset is threaded
  through via the existing `parsed=` kwarg so the XML is parsed only
  once.
- _validation.py: extracted `_gdal_geotransform_to_affine_tuple` so
  the eager and chunked VRT call sites share one GDAL->rasterio
  reorder helper instead of duplicating the index shuffle.
- _attrs.py: documented in `_validate_read_geo_info` that the
  built transform is always axis-aligned (rotated TIFFs raise
  NotImplementedError upstream in `_geotags`), so the rotated check
  fires only on the VRT path.
- tests: added direct coverage for the row-axis rotation branch
  (GDAL GT[4] non-zero) and for the zero-step / constant-float-coord
  branch in `_check_write_non_uniform_coords`. Refactored the
  rotated-VRT fixture to take a `geo_transform` kwarg so both
  rotation axes share the same builder.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant